short-term goal



Semantic Visual Navigation by Watching YouTube Videos - Supplementary Materials

Neural Information Processing Systems

Our hierarchical policy is motivated by Chaplot et al. [2], and consists of a high-level policy and a low-level policy. (In the accompanying figure, the current location is indicated by the hollow circle.) The high-level policy outputs the most promising direction to pursue as the short-term goal; the low-level policy continues to re-plan as the occupancy map updates. This process is repeated, i.e., the high-level policy takes feedback from the low-level policy. We assume access to depth images, and adapt code from the map-and-plan implementation of [4] to implement the low-level policy. We describe these two policies in more detail below.
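The interplay described above can be sketched in a few lines. The following is a minimal, illustrative stand-in, not the paper's implementation: the high-level policy is an arbitrary callable that proposes a short-term goal, and the low-level planner is a plain BFS on an occupancy grid standing in for the map-and-plan planner of [4]. All function names are assumptions for illustration.

```python
from collections import deque

import numpy as np


def plan_shortest_path(occupancy, start, goal):
    """BFS on a 4-connected grid; returns the first step toward `goal`,
    or `start` if no free path exists. Stand-in for the low-level
    map-and-plan planner; re-run whenever the occupancy map updates."""
    if start == goal:
        return start
    h, w = occupancy.shape
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the path back to the step right after `start`.
            while prev[cell] != start:
                cell = prev[cell]
            return cell
        r, c = cell
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nb[0] < h and 0 <= nb[1] < w
                    and occupancy[nb] == 0 and nb not in prev):
                prev[nb] = cell
                queue.append(nb)
    return start


def navigate(high_level_policy, occupancy, pose, steps=10):
    """High-level policy proposes a short-term goal; the low-level
    planner re-plans toward it on every map update (here, every step)."""
    for _ in range(steps):
        short_term_goal = high_level_policy(occupancy, pose)
        pose = plan_shortest_path(occupancy, pose, short_term_goal)
    return pose
```

On an empty 5x5 grid with a constant high-level goal of (4, 4), `navigate` advances one cell along a shortest path per step and reaches the goal after eight steps.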


Planning for Success: Exploring LLM Long-term Planning Capabilities in Table Understanding

Nguyen, Thi-Nhung, Ngo, Hoang, Phung, Dinh, Vu, Thuy-Trang, Nguyen, Dat Quoc

arXiv.org Artificial Intelligence

Table understanding is key to addressing challenging downstream tasks such as table-based question answering and fact verification. Recent works have focused on leveraging Chain-of-Thought and question decomposition to solve complex questions requiring multiple operations on tables. However, these methods often suffer from a lack of explicit long-term planning and weak inter-step connections, causing them to miss constraints within questions. In this paper, we propose leveraging the long-term planning capabilities of large language models (LLMs) to enhance table understanding. Our approach enables the execution of a long-term plan whose steps are tightly interconnected and serve the ultimate goal, an aspect that methods based on Chain-of-Thought and question decomposition lack. In addition, our method effectively minimizes the inclusion of unnecessary details when solving each short-term goal, a limitation of Chain-of-Thought-based methods. Extensive experiments demonstrate that our method outperforms strong baselines and achieves state-of-the-art performance on the WikiTableQuestions and TabFact datasets.
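The plan-then-execute idea can be made concrete with a small sketch. Here the plan is drawn up in full before any step runs, and each step consumes only the previous step's result, so the steps stay tightly interconnected without carrying unrelated detail. The `plan` and `operators` below are illustrative stand-ins for an LLM-generated plan, not the paper's actual interface.

```python
def execute_long_term_plan(table, plan, operators):
    """Execute a plan fixed up front; each step sees only the previous
    step's result. `plan` is a list of (operator_name, argument) pairs,
    `operators` maps names to functions -- both are hypothetical."""
    result = table
    for op_name, arg in plan:
        result = operators[op_name](result, arg)
    return result
```

For example, a two-step plan (filter rows, then select the maximum) over a toy table of rows-as-dicts runs each operator in sequence and returns only the final answer.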


Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World

Wu, Guande, Zhao, Chen, Silva, Claudio, He, He

arXiv.org Artificial Intelligence

Language agents that interact with the world on their own have great potential for automating digital tasks. While large language model (LLM) agents have made progress in understanding and executing tasks such as textual games and webpage control, many real-world tasks also require collaboration with humans or other LLMs in equal roles, which involves intent understanding, task coordination, and communication. To test LLMs' ability to collaborate, we design a blocks-world environment where two agents, each with unique goals and skills, build a target structure together. To complete the goals, they can act in the world and communicate in natural language. Within this environment, we design increasingly challenging settings to evaluate different collaboration perspectives, from independent to more complex, dependent tasks. We further adopt chain-of-thought prompts that include intermediate reasoning steps to model the partner's state and to identify and correct execution errors. Both human-machine and machine-machine experiments show that LLM agents have strong grounding capacities, and our approach significantly improves the evaluation metrics.
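The prompting scheme described above, inferring the partner's state before acting, can be sketched as a simple template builder. This is a hypothetical prompt structure assembled from the abstract's description, not the authors' exact template.

```python
def build_cot_prompt(world_state, partner_actions, own_goal):
    """Assemble a chain-of-thought prompt that asks the model to first
    model the partner's intent from its recent actions, then check for
    execution errors, then act -- an illustrative sketch."""
    return "\n".join([
        f"World state: {world_state}",
        f"Partner's recent actions: {', '.join(partner_actions)}",
        f"Your goal: {own_goal}",
        "Step 1: Infer what the partner is trying to build.",
        "Step 2: Check whether your last action caused an error; "
        "if so, plan a correction.",
        "Step 3: Choose your next action or message.",
    ])
```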


Robot Navigation in Unknown and Cluttered Workspace with Dynamical System Modulation in Starshaped Roadmap

Chen, Kai, Liu, Haichao, Li, Yulin, Duan, Jianghua, Zhu, Lei, Ma, Jun

arXiv.org Artificial Intelligence

This paper presents a novel reactive motion planning framework for navigating robots in unknown and cluttered 2D workspaces. Typical existing methods are developed by enforcing that the robot stays within free regions represented by locally extracted ellipses or polygons. Instead, we navigate the robot in free space with an alternative starshaped decomposition, which is calculated directly from real-time sensor data. Additionally, a roadmap is constructed incrementally to maintain the connectivity information of the starshaped regions. Compared to roadmaps built upon connected polygons or ellipses in conventional approaches, the concave starshaped region is better suited to capturing the natural distribution of sensor data, so that the perception information can be fully exploited for robot navigation. In this sense, conservative and myopic behaviors are avoided with the proposed approach, and intricate obstacle configurations can be suitably accommodated in unknown and cluttered environments. We then design a heuristic exploration algorithm on the roadmap to determine the frontier points of the starshaped regions, from which short-term goals are selected to attract the robot towards the goal configuration. Notably, a recovery mechanism is developed on the roadmap and triggered once a non-extendable short-term goal is reached; this mechanism makes it possible to handle the dead-end situations typically encountered in unknown and cluttered environments. Furthermore, safe and smooth motion within the starshaped regions is generated by applying the Dynamical System Modulation (DSM) approach on the constructed roadmap. Through comprehensive evaluation in both simulations and real-world experiments, the proposed method outperforms benchmark methods in terms of success rate and travel time.
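The short-term-goal selection step, picking a frontier point that pulls the robot toward the goal configuration, can be illustrated with a simple go-through cost. This is an assumed heuristic for illustration; the paper's exploration algorithm operates on the starshaped roadmap and is more involved.

```python
import math


def select_short_term_goal(frontier_points, robot, goal):
    """Pick the frontier point minimizing a robot->frontier->goal
    travel cost -- an illustrative stand-in for the paper's heuristic
    exploration on the starshaped roadmap."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return min(frontier_points, key=lambda f: dist(robot, f) + dist(f, goal))
```

With the robot at the origin and the goal at (6, 0), the frontier point (5, 0) wins over (0, 5) because passing through it yields the shorter total path.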


Deep Reinforcement Learning with Adjustments

Khorasgani, Hamed, Wang, Haiyan, Gupta, Chetan, Serita, Susumu

arXiv.org Artificial Intelligence

Deep reinforcement learning (RL) algorithms can learn complex policies to optimize agent operation over time, and have shown promising results on complicated problems in recent years. However, their application to real-world physical systems remains limited: despite advancements in RL algorithms, industry often prefers traditional control strategies, which are simple, computationally efficient, and easy to adjust. In this paper, we first propose a new Q-learning algorithm for continuous action spaces that bridges control and RL algorithms, bringing us the best of both worlds. Our method can learn complex policies to achieve long-term goals, and at the same time it can be easily adjusted to address short-term requirements without retraining. Next, we present an approximation of our algorithm that can be applied to address the short-term requirements of any pre-trained RL algorithm. Case studies demonstrate that both our proposed method and its practical approximation can achieve short-term and long-term goals without complex reward functions.
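One way to picture "adjustable without retraining" is to score candidate actions by the learned long-term value minus a tunable short-term penalty at deployment time. This is a minimal sketch of that idea under assumed interfaces (`q_fn`, `penalty_fn` are hypothetical), not the paper's exact algorithm.

```python
import numpy as np


def adjusted_action(q_fn, state, candidate_actions, penalty_fn, weight):
    """Select the action maximizing learned long-term value Q(s, a)
    minus a weighted short-term penalty; `weight` is tuned at
    deployment without touching the trained Q-function."""
    scores = [q_fn(state, a) - weight * penalty_fn(state, a)
              for a in candidate_actions]
    return candidate_actions[int(np.argmax(scores))]
```

For a Q-function that prefers action 1.0 and a penalty on large actions, weight 0 recovers the unadjusted policy, while a larger weight shifts the choice toward small actions, no retraining involved.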


Semantic Visual Navigation by Watching YouTube Videos

Chang, Matthew, Gupta, Arjun, Gupta, Saurabh

arXiv.org Artificial Intelligence

Semantic cues and statistical regularities in real-world environment layouts can improve efficiency for navigation in novel environments. This paper learns and leverages such semantic cues for navigating to objects of interest in novel environments, by simply watching YouTube videos. This is challenging because YouTube videos don't come with labels for actions or goals, and may not even showcase optimal behavior. Our proposed method tackles these challenges through the use of Q-learning on pseudo-labeled transition quadruples (image, action, next image, reward). We show that such off-policy Q-learning from passive data is able to learn meaningful semantic cues for navigation. These cues, when used in a hierarchical navigation policy, lead to improved efficiency at the ObjectGoal task in visually realistic simulations. We improve upon end-to-end RL methods by 66%, while using 250x fewer interactions. Code, data, and models will be made available.
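The core learning step, off-policy Q-learning on pseudo-labeled transition quadruples, can be shown in tabular form. The paper learns from image observations with a deep network; this discrete sketch keeps only the update rule, with states as integers and an assumed dataset layout of (state, action, next_state, reward) tuples.

```python
import numpy as np


def fit_q_from_passive_data(quadruples, n_states, n_actions,
                            gamma=0.99, lr=0.5, epochs=50):
    """Tabular off-policy Q-learning over a fixed set of pseudo-labeled
    transitions (s, a, s', r): repeatedly move Q(s, a) toward the
    bootstrapped target r + gamma * max_a' Q(s', a')."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(epochs):
        for s, a, s2, r in quadruples:
            target = r + gamma * Q[s2].max()
            Q[s, a] += lr * (target - Q[s, a])
    return Q
```

On a three-state chain where only the transition into the last state is rewarded, the fitted Q-values prefer the forward action in every state, even though the data is purely passive.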


Learning to Explore using Active Neural SLAM

Chaplot, Devendra Singh, Gandhi, Dhiraj, Gupta, Saurabh, Gupta, Abhinav, Salakhutdinov, Ruslan

arXiv.org Artificial Intelligence

This work presents a modular and hierarchical approach to learn policies for exploring 3D environments, called `Active Neural SLAM'. Our approach leverages the strengths of both classical and learning-based methods by combining analytical path planners with a learned SLAM module and global and local policies. The use of learning provides flexibility with respect to input modalities (in the SLAM module), leverages structural regularities of the world (in the global policy), and provides robustness to errors in state estimation (in the local policy). Such use of learning within each module retains its benefits, while hierarchical decomposition and modular training allow us to sidestep the high sample complexities associated with training end-to-end policies. Our experiments in visually and physically realistic simulated 3D environments demonstrate the effectiveness of our approach over past learning- and geometry-based approaches. The proposed model can also be easily transferred to the PointGoal task, and was the winning entry of the CVPR 2019 Habitat PointGoal Navigation Challenge.


Management Model in the age of AI

#artificialintelligence

I think it has become a cliché that "digital transformation leads to new business models". In reality, new business models are hard to come by, and even if you chance upon something new and compelling, it is extremely difficult to protect your innovative idea, because competitors have become adept at responding to such innovations quickly. Companies are therefore on the lookout for new forms of competitive advantage that are enduring, sustainable, hard to copy, and valuable. Most businesses derive their business model from Peter Drucker's "theory of the business": the organisation's positioning with respect to its environment (market, governments, society), the organisation's mission to deliver something valuable and relevant to that environment, and the capabilities needed not only to accomplish the mission but also to sustain business growth over the long term. Having a clear idea of these aspects of the business model is a good starting point and will certainly give you answers to the "what?" and the "why?" of your business.